Overview

Dataset statistics

Number of variables: 10
Number of observations: 373992
Missing cells: 322056
Missing cells (%): 8.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 53.9 MiB
Average record size in memory: 151.0 B
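
The percentage of missing cells follows from the counts above: missing cells divided by the total number of cells (variables × observations). The short sketch below reproduces that arithmetic using only figures copied from this report; the variable names are illustrative. The per-record size it derives is approximate, because the reported total (53.9 MiB) is already rounded.

```python
# Figures copied from the dataset statistics in this report.
n_variables = 10
n_observations = 373992
missing_cells = 322056
total_size_mib = 53.9  # rounded value as reported

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Approximate bytes per record (1 MiB = 1024 * 1024 bytes).
bytes_per_record = total_size_mib * 1024 * 1024 / n_observations

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
print(f"about {bytes_per_record:.0f} B per record")
```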

Variable types

NUM: 8
DATE: 1
CAT: 1

Reproduction

Analysis started: 2020-06-29 03:48:54.030697
Analysis finished: 2020-06-29 04:01:56.104487
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

$NO_2$ (µg/m3) has 50139 (13.4%) missing values
$O_3$ (µg/m3) has 49341 (13.2%) missing values
$PM_{10}$ (µg/m3) has 61198 (16.4%) missing values
$SO_2$ (µg/m3) has 50131 (13.4%) missing values
CO (ppm) has 61219 (16.4%) missing values
NO (µg/m3) has 50028 (13.4%) missing values
$SO_2$ (µg/m3) is highly skewed (γ1 = 38.94159293)
$SO_2$ (µg/m3) has 97269 (26.0%) zeros
CO (ppm) has 17324 (4.6%) zeros
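
As a rough illustration of how flags like these can be derived from a column, the sketch below counts missing values and zeros relative to all observations and computes a moment-based skewness. The function name `column_flags` and the `skew_threshold` default are assumptions made for this example, not the tool's actual implementation or configuration. The skewness here is the uncorrected population estimate, so it can differ slightly from the γ1 values reported.

```python
def column_flags(values, skew_threshold=20.0):
    """Summarise missing values, zeros and skewness for one column.

    `values` is a list in which missing entries are None.
    `skew_threshold` is a hypothetical cut-off chosen for illustration.
    """
    n = len(values)
    present = [v for v in values if v is not None]

    # Both percentages are relative to all observations, missing included.
    missing_pct = 100 * (n - len(present)) / n
    zeros_pct = 100 * sum(1 for v in present if v == 0) / n

    # Population (uncorrected) skewness: third central moment / variance^1.5.
    mean = sum(present) / len(present)
    m2 = sum((v - mean) ** 2 for v in present) / len(present)
    m3 = sum((v - mean) ** 3 for v in present) / len(present)
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0

    return {
        "missing_pct": missing_pct,
        "zeros_pct": zeros_pct,
        "skewness": skewness,
        "skewed": abs(skewness) > skew_threshold,
    }


# Toy column: two missing entries, two zeros and one large outlier.
flags = column_flags([0, 0, None, 1.0, 2.0, 50.0, None, 0.5])
print(flags)
```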

Variables

Date & Time
Date

Distinct count: 76704
Unique (%): 20.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB
Minimum: 2011-01-01 00:00:00
Maximum: 2020-12-01 23:00:00
Histogram

$NO_2$ (µg/m3)
Real number (ℝ)

MISSING
Distinct count: 147494
Unique (%): 45.5%
Missing: 50139
Missing (%): 13.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.490202998274462
Minimum: -4.24
Maximum: 268.6440491107547
Zeros: 505
Zeros (%): 0.1%
Memory size: 2.9 MiB

Quantile statistics

Minimum: -4.24
5-th percentile: 2.617707789
Q1: 8.912176728
Median: 16.64744817
Q3: 25.57470665
95-th percentile: 40.98787666
Maximum: 268.6440491
Range: 272.8840491
Interquartile range (IQR): 16.66252992

Descriptive statistics

Standard deviation: 12.47942515
Coefficient of variation (CV): 0.674920938
Kurtosis: 3.533978202
Mean: 18.490203
Median Absolute Deviation (MAD): 9.742163372
Skewness: 1.194035645
Sum: 5988107.712
Variance: 155.7360521
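
Several of the figures above are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below re-derives them from the $NO_2$ values reported here; the variable names are illustrative, and small differences in the last digits are expected because the inputs are already rounded.

```python
# Values copied from the NO2 statistics in this report.
minimum, maximum = -4.24, 268.6440491
q1, q3 = 8.912176728, 25.57470665
mean, std = 18.490203, 12.47942515

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance

print(f"range={value_range:.7f} iqr={iqr:.8f} cv={cv:.9f} var={variance:.7f}")
```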
Histogram with fixed size bins (bins=10)
Value                     Count  Frequency (%)
0                           505  0.1%
2.8                         190  0.1%
3                           187  0.1%
2.3                         184  < 0.1%
3.5                         181  < 0.1%
2.9                         180  < 0.1%
2.7                         173  < 0.1%
2.5                         171  < 0.1%
2.2                         171  < 0.1%
2                           171  < 0.1%
Other values (147484)    321740  86.0%
(Missing)                 50139  13.4%

Value                     Count  Frequency (%)
-4.24                         1  < 0.1%
-3.01                         1  < 0.1%
-2.83                         1  < 0.1%
-2.76                         1  < 0.1%
-2.74                         1  < 0.1%

Value                     Count  Frequency (%)
268.6440491                   1  < 0.1%
204.2906119                   1  < 0.1%
165.6963269                   1  < 0.1%
157.4632923                   1  < 0.1%
153.2053251                   1  < 0.1%

$O_3$ (µg/m3)
Real number (ℝ)

MISSING
Distinct count: 81427
Unique (%): 25.1%
Missing: 49341
Missing (%): 13.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.4134798690639
Minimum: -3.7
Maximum: 191.6
Zeros: 1426
Zeros (%): 0.4%
Memory size: 2.9 MiB

Quantile statistics

Minimum: -3.7
5-th percentile: 2.383906953
Q1: 8.217364978
Median: 15.00925263
Q3: 24.56231105
95-th percentile: 46.45945306
Maximum: 191.6
Range: 195.3
Interquartile range (IQR): 16.34494607

Descriptive statistics

Standard deviation: 14.32427863
Coefficient of variation (CV): 0.7779234957
Kurtosis: 4.848383693
Mean: 18.41347987
Median Absolute Deviation (MAD): 10.68523985
Skewness: 1.698239788
Sum: 5977954.653
Variance: 205.1849582
Histogram with fixed size bins (bins=10)
Value                     Count  Frequency (%)
0                          1426  0.4%
19.36820278                 265  0.1%
18.97692596                 248  0.1%
17.41181867                 242  0.1%
17.21618025                 237  0.1%
20.93331008                 234  0.1%
18.58564914                 233  0.1%
18.39001072                 230  0.1%
19.75947961                 229  0.1%
18.78128755                 228  0.1%
Other values (81417)     321079  85.9%
(Missing)                 49341  13.2%

Value                     Count  Frequency (%)
-3.7                          1  < 0.1%
0                          1426  0.4%
0.1                           8  < 0.1%
0.1915285258                  1  < 0.1%
0.1918425842                  1  < 0.1%

Value                     Count  Frequency (%)
191.6                         1  < 0.1%
177                           1  < 0.1%
174.2                         1  < 0.1%
173.2                         1  < 0.1%
173.0224138                   1  < 0.1%

$PM_{10}$ (µg/m3)
Real number (ℝ≥0)

MISSING
Distinct count: 1773
Unique (%): 0.6%
Missing: 61198
Missing (%): 16.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.38087846953586
Minimum: 0.0
Maximum: 969.4
Zeros: 1301
Zeros (%): 0.3%
Memory size: 2.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 4.6
Q1: 11.8
Median: 20.1
Q3: 30.5
95-th percentile: 52.6
Maximum: 969.4
Range: 969.4
Interquartile range (IQR): 18.7

Descriptive statistics

Standard deviation: 17.94502601
Coefficient of variation (CV): 0.7675086302
Kurtosis: 109.3798404
Mean: 23.38087847
Median Absolute Deviation (MAD): 12.12191792
Skewness: 5.063181606
Sum: 7313398.5
Variance: 322.0239584
Histogram with fixed size bins (bins=10)
Value                     Count  Frequency (%)
0                          1301  0.3%
12.5                       1131  0.3%
14.3                       1106  0.3%
15.5                       1102  0.3%
6.8                        1101  0.3%
11                         1097  0.3%
16.3                       1095  0.3%
8                          1094  0.3%
8.3                        1090  0.3%
12.8                       1089  0.3%
Other values (1763)      301588  80.6%
(Missing)                 61198  16.4%

Value                     Count  Frequency (%)
0                          1301  0.3%
0.1                          70  < 0.1%
0.2                          71  < 0.1%
0.3                         217  0.1%
0.4                          88  < 0.1%

Value                     Count  Frequency (%)
969.4                         1  < 0.1%
878.4                         1  < 0.1%
832.9                         1  < 0.1%
656.8                         1  < 0.1%
598                           1  < 0.1%

$SO_2$ (µg/m3)
Real number (ℝ)

MISSING
SKEWED
ZEROS
Distinct count: 20758
Unique (%): 6.4%
Missing: 50131
Missing (%): 13.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.310778937240057
Minimum: -10.5
Maximum: 1101.5384382999832
Zeros: 97269
Zeros (%): 26.0%
Memory size: 2.9 MiB

Quantile statistics

Minimum: -10.5
5-th percentile: 0
Q1: 0
Median: 0.7777908106
Q3: 2.083456723
95-th percentile: 10.70664776
Maximum: 1101.538438
Range: 1112.038438
Interquartile range (IQR): 2.083456723

Descriptive statistics

Standard deviation: 8.941202146
Coefficient of variation (CV): 3.869345528
Kurtosis: 2698.673818
Mean: 2.310778937
Median Absolute Deviation (MAD): 2.691232364
Skewness: 38.94159293
Sum: 748371.1774
Variance: 79.94509582
Histogram with fixed size bins (bins=10)
Value                     Count  Frequency (%)
0                         97269  26.0%
0.5208641808               2363  0.6%
0.7812962712               2261  0.6%
0.2604320904               2231  0.6%
0.2606475499               2033  0.5%
0.2611202188               1992  0.5%
1.041728362                1940  0.5%
0.2600073888               1921  0.5%
0.2611377503               1883  0.5%
0.5200147776               1801  0.5%
Other values (20748)     208167  55.7%
(Missing)                 50131  13.4%

Value                     Count  Frequency (%)
-10.5                         1  < 0.1%
-9.5                          1  < 0.1%
-6.9                          1  < 0.1%
-6.6                          1  < 0.1%
-4.3                          2  < 0.1%

Value                     Count  Frequency (%)
1101.538438                   1  < 0.1%
863.4                         1  < 0.1%
830.2925058                   1  < 0.1%
798.1878327                   1  < 0.1%
769.8                         1  < 0.1%

CO (ppm)
Real number (ℝ)

MISSING
ZEROS
Distinct count: 467
Unique (%): 0.1%
Missing: 61219
Missing (%): 16.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.35002592934812143
Minimum: -0.22
Maximum: 7.66
Zeros: 17324
Zeros (%): 4.6%
Memory size: 2.9 MiB

Quantile statistics

Minimum: -0.22
5-th percentile: 0
Q1: 0.14
Median: 0.29
Q3: 0.49
95-th percentile: 0.87
Maximum: 7.66
Range: 7.88
Interquartile range (IQR): 0.35

Descriptive statistics

Standard deviation: 0.3062733398
Coefficient of variation (CV): 0.8750018615
Kurtosis: 22.42345833
Mean: 0.3500259293
Median Absolute Deviation (MAD): 0.2211982598
Skewness: 2.797453125
Sum: 109478.66
Variance: 0.09380335864
Histogram with fixed size bins (bins=10)
Value                     Count  Frequency (%)
0                         17324  4.6%
0.01                       5979  1.6%
0.14                       5600  1.5%
0.15                       5558  1.5%
0.17                       5542  1.5%
0.11                       5408  1.4%
0.12                       5405  1.4%
0.16                       5390  1.4%
0.18                       5367  1.4%
0.13                       5365  1.4%
Other values (457)       245835  65.7%
(Missing)                 61219  16.4%

Value                     Count  Frequency (%)
-0.22                         1  < 0.1%
-0.21                         2  < 0.1%
-0.2                          1  < 0.1%
-0.19                         2  < 0.1%
-0.17                         2  < 0.1%

Value                     Count  Frequency (%)
7.66                          1  < 0.1%
6.26                          1  < 0.1%
6.09                          1  < 0.1%
6.05                          1  < 0.1%
5.99                          1  < 0.1%

NO (µg/m3)
Real number (ℝ)

MISSING
Distinct count: 182899
Unique (%): 56.5%
Missing: 50028
Missing (%): 13.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.27803409251523
Minimum: -1.7
Maximum: 684.9544090228213
Zeros: 1770
Zeros (%): 0.5%
Memory size: 2.9 MiB

Quantile statistics

Minimum: -1.7
5-th percentile: 0.7325601385
Q1: 3.484188223
Median: 15.75545285
Q3: 39.55759935
95-th percentile: 112.3550282
Maximum: 684.954409
Range: 686.654409
Interquartile range (IQR): 36.07341112

Descriptive statistics

Standard deviation: 43.38333828
Coefficient of variation (CV): 1.432832071
Kurtosis: 14.26158041
Mean: 30.27803409
Median Absolute Deviation (MAD): 27.851887
Skewness: 3.212817958
Sum: 9808993.037
Variance: 1882.11404
Histogram with fixed size bins (bins=10)
Value                     Count  Frequency (%)
0                          1770  0.5%
0.2                         900  0.2%
0.3                         851  0.2%
0.1                         801  0.2%
0.4                         763  0.2%
0.5                         694  0.2%
0.6                         630  0.2%
0.7                         543  0.1%
-0.1                        493  0.1%
0.8                         410  0.1%
Other values (182889)    316109  84.5%
(Missing)                 50028  13.4%

Value                     Count  Frequency (%)
-1.7                          1  < 0.1%
-1.31                         1  < 0.1%
-1.25                         1  < 0.1%
-1.2                          1  < 0.1%
-1.19                         1  < 0.1%

Value                     Count  Frequency (%)
684.954409                    1  < 0.1%
647.7467967                   1  < 0.1%
560.8769782                   1  < 0.1%
545.6961459                   1  < 0.1%
545.658232                    1  < 0.1%

lat
Real number (ℝ)

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -12.920578356575412
Minimum: -13.005500404304225
Maximum: -12.68
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 MiB

Quantile statistics

Minimum: -13.0055004
5-th percentile: -13.0055004
Q1: -12.98973907
Median: -12.9642515
Q3: -12.89890347
95-th percentile: -12.68
Maximum: -12.68
Range: 0.3255004043
Interquartile range (IQR): 0.09083560896

Descriptive statistics

Standard deviation: 0.1030244293
Coefficient of variation (CV): -0.007973670096
Kurtosis: 0.3329860998
Mean: -12.92057836
Median Absolute Deviation (MAD): 0.08301179845
Skewness: 1.333277238
Sum: -4832192.941
Variance: 0.01061403303
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-13.0055004 -12.99233172 -12.98672925 -12.98086074 -12.97112677 ... -12.92635635 -12.84065173 -12.75769 -12.70649 -12.68 ], "bayesian blocks" binning strategy used)
Value                     Count  Frequency (%)
-12.95380924              43824  11.7%
-12.98371943              43680  11.7%
-13.0055004               43680  11.7%
-12.98973907              43656  11.7%
-12.89890347              43608  11.7%
-12.7824                  32880  8.8%
-12.68                    32352  8.7%
-12.97800204              26256  7.0%
-12.99492436              26040  7.0%
-12.9642515               26016  7.0%

Value                     Count  Frequency (%)
-13.0055004               43680  11.7%
-12.99492436              26040  7.0%
-12.98973907              43656  11.7%
-12.98371943              43680  11.7%
-12.97800204              26256  7.0%

Value                     Count  Frequency (%)
-12.68                    32352  8.7%
-12.73298                 12000  3.2%
-12.7824                  32880  8.8%
-12.89890347              43608  11.7%
-12.95380924              43824  11.7%

lon
Real number (ℝ)

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -38.48889626044166
Minimum: -38.54518
Maximum: -38.4283765135188
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.9 MiB

Quantile statistics

Minimum: -38.54518
5-th percentile: -38.54518
Q1: -38.51551
Median: -38.48717394
Q3: -38.46892783
95-th percentile: -38.42837651
Maximum: -38.42837651
Range: 0.1168034865
Interquartile range (IQR): 0.04658216752

Descriptive statistics

Standard deviation: 0.03325423365
Coefficient of variation (CV): -0.0008639955124
Kurtosis: -0.7469087889
Mean: -38.48889626
Median Absolute Deviation (MAD): 0.02784474634
Skewness: 0.2197940664
Sum: -14394539.29
Variance: 0.001105844055
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-38.54518 -38.51944898 -38.51716 -38.51124886 -38.49708083 ... -38.47734722 -38.47214644 -38.46338883 -38.44311317 -38.42837651], "bayesian blocks" binning strategy used)
Value                     Count  Frequency (%)
-38.42837651              43824  11.7%
-38.48717394              43680  11.7%
-38.50698772              43680  11.7%
-38.52008796              43656  11.7%
-38.45784983              43608  11.7%
-38.51551                 32880  8.8%
-38.54518                 32352  8.7%
-38.46892783              26256  7.0%
-38.47536505              26040  7.0%
-38.4793294               26016  7.0%

Value                     Count  Frequency (%)
-38.54518                 32352  8.7%
-38.52008796              43656  11.7%
-38.51881                 12000  3.2%
-38.51551                 32880  8.8%
-38.50698772              43680  11.7%

Value                     Count  Frequency (%)
-38.42837651              43824  11.7%
-38.45784983              43608  11.7%
-38.46892783              26256  7.0%
-38.47536505              26040  7.0%
-38.4793294               26016  7.0%

station
Categorical

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.9 MiB

Value                     Count
PARALELA-CAB              43824
DIQUE DO TORORÓ           43680
RIO VERMELHO              43680
CAMPO GRANDE              43656
PIRAJÁ                    43608
Other values (6)         155544

Value                     Count  Frequency (%)
PARALELA-CAB              43824  11.7%
DIQUE DO TORORÓ           43680  11.7%
RIO VERMELHO              43680  11.7%
CAMPO GRANDE              43656  11.7%
PIRAJÁ                    43608  11.7%
BOTELHO                   32880  8.8%
MALEMBÁ                   32352  8.7%
AV ACM - DETRAN           26256  7.0%
ITAIGARA                  26040  7.0%
AV BARROS REIS            26016  7.0%

Length

Max length: 15
Mean length: 10.65738305
Min length: 6

Value                     Count  Frequency (%)
Uppercase_Letter             22  91.7%
Dash_Punctuation              1  4.2%
Space_Separator               1  4.2%

Value                     Count  Frequency (%)
Latin                        22  91.7%
Common                        2  8.3%

Value                     Count  Frequency (%)
ASCII                        22  100.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line, that is its angle to the x-axis, does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
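
The calculation just described can be sketched in a few lines of plain Python. The function name `pearson_r` and the toy data are illustrative assumptions; the example also checks the invariance property by shifting and rescaling both variables.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample covariance and sample standard deviations (n - 1 in the denominator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)


# Toy data: y is roughly 2 * x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(x, y)
# Changing location and (positive) scale of each variable leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_rescaled)
```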

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
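
A minimal sketch of that recipe: replace each value by its rank (ties receive the average of their positions), then apply the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are assumptions for this example. The toy data is monotonic but nonlinear, so ρ reaches 1 while Pearson's r stays below 1.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic, but not linear

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```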

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
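
The pair-counting definition can be sketched directly. The function name `kendall_tau_a` is an assumption for this example; it implements the simple form described here (often called tau-a), which makes no correction for ties. Statistical libraries may use a tie-corrected variant instead (SciPy's `kendalltau`, for instance, defaults to tau-b), so results can differ when the data contains ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Toy data: one swapped pair among four observations (5 concordant, 1 discordant).
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```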

Missing values

Sample

First rows

   Date & Time          $NO_2$ (µg/m3)  $O_3$ (µg/m3)  $PM_{10}$ (µg/m3)  $SO_2$ (µg/m3)  CO (ppm)  NO (µg/m3)  lat         lon         station
0  2013-01-02 00:00:00  18.904214       11.521811      26.5               2.345836        0.53      10.744251   -12.978002  -38.468928  AV ACM - DETRAN
1  2013-01-02 01:00:00  10.309862       13.299354      20.1               3.915609        0.52      0.978220    -12.978002  -38.468928  AV ACM - DETRAN
2  2013-01-02 02:00:00  5.442022        14.684361      24.0               4.442517        0.51      0.734463    -12.978002  -38.468928  AV ACM - DETRAN
3  2013-01-02 03:00:00  4.131542        16.067001      17.5               4.707386        0.51      0.490011    -12.978002  -38.468928  AV ACM - DETRAN
4  2013-01-02 04:00:00  4.508647        16.660399      11.7               4.708964        0.52      0.490175    -12.978002  -38.468928  AV ACM - DETRAN
5  2013-01-02 05:00:00  3.380636        17.831946      14.9               4.969324        0.54      1.225130    -12.978002  -38.468928  AV ACM - DETRAN
6  2013-01-02 06:00:00  10.095408       15.409505      11.9               5.467229        0.54      2.195123    -12.978002  -38.468928  AV ACM - DETRAN
7  2013-01-02 07:00:00  18.192390       11.040024      19.9               5.428762        0.60      10.535112   -12.978002  -38.468928  AV ACM - DETRAN
8  2013-01-02 08:00:00  15.702427       17.539643      17.2               5.145123        0.55      14.460568   -12.978002  -38.468928  AV ACM - DETRAN
9  2013-01-02 09:00:00  20.618923       19.399959      11.3               4.871014        0.49      18.493747   -12.978002  -38.468928  AV ACM - DETRAN

Last rows

        Date & Time          $NO_2$ (µg/m3)  $O_3$ (µg/m3)  $PM_{10}$ (µg/m3)  $SO_2$ (µg/m3)  CO (ppm)  NO (µg/m3)  lat     lon        station
373982  2020-01-13 15:00:00  5.25            52.6           18.6               -0.4            1.56      1.25        -12.68  -38.54518  MALEMBÁ
373983  2020-01-13 16:00:00  4.79            39.0           9.6                0.6             1.69      1.08        -12.68  -38.54518  MALEMBÁ
373984  2020-01-13 17:00:00  4.55            32.4           17.2               -1.7            1.65      1.52        -12.68  -38.54518  MALEMBÁ
373985  2020-01-13 18:00:00  4.71            27.7           33.8               -6.9            1.53      1.27        -12.68  -38.54518  MALEMBÁ
373986  2020-01-13 19:00:00  7.16            21.0           25.3               -2.6            0.26      1.05        -12.68  -38.54518  MALEMBÁ
373987  2020-01-13 20:00:00  5.91            19.0           43.7               9.8             0.67      0.86        -12.68  -38.54518  MALEMBÁ
373988  2020-01-13 21:00:00  9.43            13.5           24.4               11.6            0.72      1.33        -12.68  -38.54518  MALEMBÁ
373989  2020-01-13 22:00:00  13.62           9.9            18.0               10.2            0.67      1.30        -12.68  -38.54518  MALEMBÁ
373990  2020-01-13 23:00:00  15.23           9.2            21.0               10.4            0.65      1.12        -12.68  -38.54518  MALEMBÁ
373991  2020-01-13 00:00:00  13.17           10.3           11.9               11.2            0.62      1.31        -12.68  -38.54518  MALEMBÁ